No more segfault during regex parsing #451
Conversation
If this were the .mata files parser, I would say that trimming every automaton implicitly might not be the intention: you could not, for example, parse a template of an automaton that you would later modify. But for the regex parser this concern does not apply, since you always want the most succinct representation of the regex. Under this assumption, the logic should be fine.
However, since we have effectively removed the state-renumbering function, I would call nfa::Nfa::trim() directly in place of re2parser::renumber_states(). Alternatively, keep the original re2parser::renumber_states() (and fix it), or remove it entirely.
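To make the suggestion concrete, here is a minimal sketch of what the call site inside the regex parser could look like. The function name finalize_parsed_nfa and the include path are illustrative assumptions; only nfa::Nfa::trim() and re2parser::renumber_states() are taken from this discussion.

```cpp
// Illustrative sketch only -- not the actual re2parser code.
#include "mata/nfa/nfa.hh"  // include path assumed

namespace mata::parser {

// Hypothetical last step of the regex-to-NFA conversion.
void finalize_parsed_nfa(mata::nfa::Nfa& aut) {
    // Previously, something like re2parser::renumber_states(aut) would be
    // called here. The suggestion is to trim instead: trim() drops useless
    // (unreachable or non-terminating) states and leaves a compactly
    // numbered automaton, i.e. the most succinct representation of the
    // parsed regex.
    aut.trim();
}

} // namespace mata::parser
```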
Have you checked the performance impact of this change? Was there any significant change in the performance of the regex parser?
After discussing this with @koniksedy in person, we decided to remove the re2parser::renumber_states() function.
By the way, does it really fix the linked issue? It might be good to add one (or more) of the regexes from #437 to our unit test suite to ensure this issue is not encountered again.
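A possible shape for such a regression test, assuming the project's Catch2-based unit tests and a parsing entry point like mata::parser::create_nfa(); the regex literal below is only a placeholder and should be replaced by an actual regex from #437.

```cpp
#include <string>
#include <catch2/catch.hpp>            // test framework assumed
#include "mata/nfa/nfa.hh"
#include "mata/parser/re2parser.hh"    // header path assumed

TEST_CASE("re2parser handles regexes from issue #437 without crashing") {
    mata::nfa::Nfa aut;
    // PLACEHOLDER: substitute one of the regexes from #437 that used to segfault.
    const std::string regex{ "(a|b)*c" };
    mata::parser::create_nfa(&aut, regex);  // assumed signature
    SUCCEED("regex parsed without a segfault");
}
```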
Performance results (single_param_jobs, double_param_jobs):
The performance drop is negligible. Cumulative parsing time for a set of regular expressions with this PR: ~37 ms.
LGTM.
This PR: […] (mata::nfa::Limits::max_state).